Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries
Abstract
This paper proposes an efficient client-server query translation approach that makes cross-language information retrieval (CLIR) services more feasible to implement in digital library (DL) systems. A centralized query translation server processes the translation requests for cross-lingual queries from connected DL systems. To extract translations not covered by standard dictionaries, the server is built on a novel integration of dictionary resources and Web mining methods, including anchor-text and search-result methods. These methods exploit the huge amounts of multilingual, wide-scoped Web resources as live bilingual corpora to alleviate translation difficulties, and have proven particularly effective for extracting multilingual translation equivalents of query terms containing proper names or new terminology. The proposed approach was implemented in a query translation engine called LiveTrans, which has demonstrated its feasibility in providing efficient English-Chinese CLIR services for DLs.
Similar resources
Exploiting the Web as the multilingual corpus for unknown query translation
Users’ cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term transla...
LiveTrans-Cross-Language Web Search through Live Mining of Query Translations
Enabling users to automatically find effective translations for query terms not included in a dictionary is one of the major goals of a practical cross-language Web search service. This paper presents a cross-language Web search system called LiveTrans, which is an experimental metasearch engine that provides English-Chinese cross-lingual retrieval of both Web pages and images. The system has bee...
A Transitive Model for Extracting Translation Equivalents of Web Queries through Anchor Text Mining
One of the existing difficulties of cross-language information retrieval (CLIR) and Web search is the lack of appropriate translations of new terminology and proper names. Different from conventional approaches, in our previous research we developed an approach for exploiting Web anchor texts as live bilingual corpora and reducing the existing difficulties of query term translation. Although We...
Domain-Specific Query Translation for Multilingual Access to Digital Libraries
Accurate high-coverage translation is a vital component of reliable cross language information access (CLIR) systems. This is particularly true of access to archives such as Digital Libraries which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in information retrieval evaluation workshops, it is not well suited to spe...
A System to Mine Large-Scale Bilingual Dictionaries from Monolingual Web Pages
This paper describes a system that automatically mines English-Chinese translation pairs from a large amount of monolingual Chinese web pages. Our approach is motivated by the observation that many Chinese terms (e.g., named entities that are not stored in a conventional dictionary) are accompanied by their English translations in the Chinese web pages. In our approach, candidate translations are ...